refactor(arrow-rs): migrate multi-key sort comparator to Arrow-rs #6087
base: main
Greptile Overview
Greptile Summary
Migrated the multi-column sort comparator construction from Arrow2 to Arrow-rs in
Key changes:
This is part of the staged Arrow2 → Arrow-rs migration effort and maintains backward compatibility while progressively modernizing the array backend.
Confidence Score: 5/5
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant build_multi_array_bicompare
participant Series
participant ArrowRS as Arrow-rs Array
participant Converter as From Trait
participant Arrow2 as Arrow2 Array
participant Comparator as build_nulls_first_compare_with_nulls
Caller->>build_multi_array_bicompare: left, right, descending, nulls_first
loop For each (left[i], right[i]) pair
build_multi_array_bicompare->>Series: l.to_arrow()?
Series-->>build_multi_array_bicompare: l_arrow: ArrayRef (Arrow-rs)
build_multi_array_bicompare->>Series: r.to_arrow()?
Series-->>build_multi_array_bicompare: r_arrow: ArrayRef (Arrow-rs)
build_multi_array_bicompare->>Converter: Box::from(l_arrow.as_ref() as &dyn Array)
Converter->>ArrowRS: to_data()
ArrowRS-->>Converter: ArrayData
Converter->>Arrow2: from_data()
Arrow2-->>build_multi_array_bicompare: l_arrow2: Box<dyn daft_arrow::array::Array>
build_multi_array_bicompare->>Converter: Box::from(r_arrow.as_ref() as &dyn Array)
Converter->>ArrowRS: to_data()
ArrowRS-->>Converter: ArrayData
Converter->>Arrow2: from_data()
Arrow2-->>build_multi_array_bicompare: r_arrow2: Box<dyn daft_arrow::array::Array>
build_multi_array_bicompare->>Comparator: build_nulls_first_compare_with_nulls(l_arrow2, r_arrow2, desc, nf)
Comparator-->>build_multi_array_bicompare: DynComparator
build_multi_array_bicompare->>build_multi_array_bicompare: cmp_list.push(comparator)
end
build_multi_array_bicompare->>build_multi_array_bicompare: Create combined_comparator closure
Note over build_multi_array_bicompare: Iterates cmp_list, returns first non-Equal ordering
build_multi_array_bicompare-->>Caller: combined_comparator: DynComparator
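The last step of the diagram (iterate `cmp_list`, return the first non-`Equal` ordering) can be sketched in plain Rust. This is a minimal illustration rather than the code from the PR: the `DynComparator` alias below is a simplified stand-in for the real type, and `column_comparator` and `combine` are hypothetical helpers that operate on plain `Vec<i64>` columns with no Arrow arrays involved.

```rust
use std::cmp::Ordering;

/// Simplified stand-in (assumption) for the PR's `DynComparator`: compares
/// row `i` of the left input with row `j` of the right input.
type DynComparator = Box<dyn Fn(usize, usize) -> Ordering + Send + Sync>;

/// Hypothetical helper: builds a single-column comparator over two owned
/// i64 columns, reversing the result when `descending` is set.
fn column_comparator(left: Vec<i64>, right: Vec<i64>, descending: bool) -> DynComparator {
    Box::new(move |i, j| {
        let ord = left[i].cmp(&right[j]);
        if descending { ord.reverse() } else { ord }
    })
}

/// Hypothetical helper: combines per-column comparators so that the first
/// column reporting a non-Equal ordering decides the result.
fn combine(cmp_list: Vec<DynComparator>) -> DynComparator {
    Box::new(move |i, j| {
        cmp_list
            .iter()
            .map(|cmp| cmp(i, j))
            .find(|ord| *ord != Ordering::Equal)
            .unwrap_or(Ordering::Equal)
    })
}
```

Because the iterator is lazy, later columns are only consulted when every earlier column compares as `Equal`, which is the tie-breaking behavior the diagram describes.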
No files reviewed, no comments
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #6087 +/- ##
==========================================
- Coverage 72.91% 72.90% -0.02%
==========================================
Files 973 973
Lines 126184 126200 +16
==========================================
- Hits 92011 92009 -2
- Misses 34173 34191 +18
@universalmind303, could you review this when convenient? Thanks.
universalmind303
left a comment
I don't think this is exactly what we're looking for. This PR just replaces deprecated calls with other deprecated calls. Instead, we need to rewrite the entire sort kernel to use arrow-rs.
Thanks for the review and the clarification! You're right – in its current form this PR is essentially just moving the call site from
My initial goal here was to take a very small step by removing the direct
Given your feedback, I'm happy to:
In the meantime, I'll mark this PR as draft so it doesn't block the proper Arrow-rs sort kernel rewrite. Before I start that work, could you confirm if you have a preferred target for the first “fully Arrow-rs” sort kernel (e.g.
I don't think there's really a solid proposal or plan in place. What I'm really looking for in PRs is a fully self-contained part of the codebase being moved over from arrow2 to arrow-rs. That could be a module, a single function, the entire kernel, or any other atomic unit of code.
@huleilei, I added some context to the arrow2 migration discussion that is likely helpful for you |
Changes Made
Migrate the multi-column sort comparator construction path from Arrow2 to the Arrow-rs-based export, focusing specifically on `build_multi_array_bicompare` in `src/daft-core/src/array/ops/sort.rs`.

- Refactor `build_multi_array_bicompare` to remove direct reliance on `Series::to_arrow2()`.
- For each `(left[i], right[i])` series pair:
  - Use the Arrow-rs export path `Series::to_arrow()` to obtain canonical `arrow::array::ArrayRef` representations.
  - Bridge these Arrow-rs arrays into the legacy Arrow2 `daft_arrow::array::Array` representation via the existing `From<&dyn arrow_array::Array> for Box<dyn Array>` conversion (behind the `arrow` feature).
  - Pass the bridged Arrow2 arrays into `kernels::search_sorted::build_nulls_first_compare_with_nulls` to construct per-column `DynComparator`s that respect both `descending` (reversed sort direction) and `nulls_first` (null ordering) semantics.
- Keep the combined multi-column comparator logic unchanged: the returned comparator still iterates the per-column comparators and returns on the first non-`Equal` ordering.

Related Issues
Part of Daft’s staged Arrow2 → Arrow-rs migration (#5741).
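
To make the `descending` and `nulls_first` options above concrete, here is a small standard-library sketch of one way a per-column comparison could honor both flags. It is not the implementation of `build_nulls_first_compare_with_nulls`. It assumes that `descending` reverses only the order of non-null values while `nulls_first` alone decides where nulls land, and the real kernel may differ. `compare_nullable` and `sort_indices` are hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: compares two nullable values.
/// Assumption: `descending` flips only non-null comparisons, and
/// `nulls_first` alone decides whether nulls sort before or after values.
fn compare_nullable(
    a: Option<i64>,
    b: Option<i64>,
    descending: bool,
    nulls_first: bool,
) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => {
            if nulls_first { Ordering::Less } else { Ordering::Greater }
        }
        (Some(_), None) => {
            if nulls_first { Ordering::Greater } else { Ordering::Less }
        }
        (Some(x), Some(y)) => {
            let ord = x.cmp(&y);
            if descending { ord.reverse() } else { ord }
        }
    }
}

/// Hypothetical helper: returns the row indices of `col` in sorted order.
fn sort_indices(col: &[Option<i64>], descending: bool, nulls_first: bool) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..col.len()).collect();
    // `sort_by` is stable, so rows that compare Equal keep their input order.
    idx.sort_by(|&i, &j| compare_nullable(col[i], col[j], descending, nulls_first));
    idx
}
```

For the column `[Some(3), None, Some(1)]`, an ascending sort with nulls first yields the indices `[1, 2, 0]`, and a descending sort with nulls last yields `[0, 2, 1]`.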